Abstract

Market segmentation of an online auction site is studied by analyzing the users' bidding behavior. 
The distribution of user activity is investigated and a network of bidders connected by common 
interest in individual articles is constructed. The network's cluster structure corresponds to the 
main user groups according to common interest, exhibiting hierarchy and overlap. A key feature of
the analysis is its independence of any similarity measure between the articles offered on eBay, as 
such a measure would only introduce bias in the analysis. Results are compared to null models based 
on random networks and clusters are validated and interpreted using the taxonomic classifications 
of eBay categories. We find clear-cut and coherent interest profiles for the bidders in each cluster. 
The interest profiles of bidder groups are compared to the classification of articles actually bought 
by these users during the time span 6-9 months after the initial grouping. The interest profiles 
discovered remain stable, indicating typical interest profiles in society. Our results show how 
network theory can be applied successfully to problems of market segmentation and sociological 
milieu studies with sparse, high dimensional data. 






Introduction 



The internet has changed the way people communicate, work, and do business. Online auction sites
are one example, the largest being eBay with more than 150 million registered users worldwide.
An interesting aspect of eBay's success is its transparency. The market is fully transparent as the
trading history of every user is disclosed to everyone on the internet. We here study the relationship 
between the participants of this market. 



Figure 1: Structure of a single auction. Users express their common interest in a particular article by 
bidding. The user with the highest bid wins the auction and exchanges money and the article with 
the seller. eBay earns a fee with every transaction. Users of the auction site, i.e. bidders, buyers or
sellers, may change their role in a different auction of another article. 

Let us first recall the operating principle of an online auction in Figure 1. Users may offer goods
through the online platform and set a deadline when their auction will end. Articles are listed under 
a certain taxonomic product category by the seller and are searchable platform wide. Users with a 
particular demand either browse through the articles listed in an appropriate category or search for 
articles directly. Until the end of the auction they may bid on the article. The user with the highest 
bid at the end of the auction wins (so called hard-close) and buys the article. In every new auction, 
users may assume different new roles as sellers, bidders or buyers. The market can be represented as 
a graph with the users and/or articles as the nodes and the links denoting their interactions as shown 
in Figure 1.

A number of researchers have presented statistical studies of trading [2] and analyses of bidding
strategies and auction ending rules [Hill]. In this contribution we focus on the market segmentation
of the eBay auction site. Our approach is based on the assumption that at a certain level of
abstraction the population of consumers can be separated into relatively clear-cut and homogeneous
sub-groups corresponding to certain customer milieus or market segments. We assume that customers
of the same type are described by a common pattern in their consumer interests which leads
to a higher probability of bidding for the same article [5].
In particular, we perform a cluster analysis 0IEj of the bidding behavior of about one million users.
Groups of eBay users with common interest or demand are detected using solely the information of 
which users competed in the same auctions. The classification is based on a very sparse and high 
dimensional data set [0] with only slightly more than 3 auctions per bidder on average (out of 1.6 
million possible auctions). Conventional analysis techniques such as correspondence analysis [5, 10]
have to make use of a similarity measure between articles in order to reduce the dimensionality and 
coarse-grain the data, such as exploiting the annotation of articles into product categories. However, 
this bears several pitfalls: First, the annotations are defined by the seller who lists the article such that 
it can be found efficiently, hence, the categorization is mainly a taxonomy. Using this to coarse-grain 
the data would introduce a bias in the analysis. Second, eBay categories differ largely in size when 
counting the number of articles in the category as well as the number of sub-categories. Correcting 
for this again may introduce a bias. Third, using the category taxonomy for coarse-graining induces 
a hierarchy in the data, as all articles below the cut in the taxonomy tree are subsumed. Fourth 
and most importantly, it is not clear at which level in the category tree a coarse-graining should be 
performed and whether this level should be the same for all branches. 

Our analysis is independent of taxonomic categories and dimensionality reduction. It allows for 
hierarchical and overlapping cluster structures, and we find evidence for both. The product categories 
are solely used to interpret the results of our study, i.e. provide interest profiles of user groups found 
in terms of this taxonomy. 

By clustering users directly according to a common demand spectrum, we also circumvent problems 
of conventional basket analysis done by frequent item sets [11, 12, 13, 14]. The latter asks which articles
are frequently demanded by a single person. This analysis is performed for all articles averaging over 
the entire population of consumers and hence results in the least common denominator of articles 
which may then be bundled together and marketed together to the whole population of customers. 
The same is true for cluster analysis of eBay categories [1]. The proposed network cluster analysis,
however, reveals information about people and their diverse and possibly very special interests. 

Dataset 

A dataset consisting of over 1.59 million auctions was obtained from the German eBay site www.ebay.de,
covering auctions ending during the pre-Christmas season between December 6th and 20th, 2004. Considering only articles with
locations in Germany, we recorded the user-id of seller, buyer, and all bidders competing in each auc- 
tion, as well as the individual bids and the product category in which the article was listed (excluding 
articles listed in the real estate category which was in a beta testing phase at the time). Since auctions 
last between 7 and 10 days depending on the choice of the seller, we thus cover a bidding period of up 
to 25 days. We believe the pre-Christmas period is suitable for analysis for the following reasons:
First, traffic is very high. In fact, a broad advertising campaign in Germany encouraged users to
shop for Christmas presents on eBay. Second, we only considered auctions and expect that users are
unlikely to bid for articles for which they cannot assess a fair price. Third, if users shop for presents, 
then we can gain some information about their family background, e.g. people shopping for toys will 
most likely have a child themselves or among their closer relatives. Our findings indicate that this is 
indeed the case. Table 1 summarizes the dataset in its basic parameters. There are far fewer sellers
than bidders and only 38% of the sellers also act as bidders or buyers. This indicates that users are
split into those mostly selling and those mostly buying. 

User Activity and User Networks 

The activity of the users is measured via the probability mass distributions of the number of articles 
sold $p(s)$, bought (auctions won) $p(w)$, and bid on $p(a)$. Though it is possible to bid multiple times in a
single auction, we neglect this fact and use "bid" and "take part in an auction" synonymously. Similar 
to previous studies [2], we find fat-tailed distributions of the user activity in the form $p(x) \propto x^{-\kappa}$.
For the number of bidders $b$ taking part in an auction, the "attractiveness of an article", we find
an exponential distribution $q(b) \propto \alpha^{b}$. Table 2 summarizes the parameters obtained by maximum
likelihood fitting for these distributions [15, 16]. Plots of the data can be found in the supporting
online material. 
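
The fits summarized in Table 2 can be illustrated with a minimal sketch. The text does not state which estimators were used, so the following is an assumption: the widely used approximate discrete maximum likelihood estimator for a power-law tail, and the closed-form estimator for a geometric distribution on $b = 1, 2, 3, \dots$. The function names and the cutoff argument `x_min` are hypothetical.

```python
import math

def powerlaw_tail_exponent(samples, x_min=10):
    """Approximate discrete MLE of kappa for p(x) ~ x^-kappa on x >= x_min.

    Uses kappa ~ 1 + n / sum(ln(x_i / (x_min - 0.5))), a standard
    approximation (assumed here, not taken from the paper).
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum

def geometric_parameter(counts):
    """MLE of alpha for q(b) = (1 - alpha) * alpha^(b - 1) on b = 1, 2, ...

    The mean of this distribution is 1 / (1 - alpha), so alpha = 1 - 1 / mean.
    """
    mean = sum(counts) / len(counts)
    return 1.0 - 1.0 / mean
```

As a consistency check under this assumption, an average of 3.4 bidders per auction gives $\alpha = 1 - 1/3.4 \approx 0.71$, which matches the value reported in Table 2.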

The fat tails of the distribution are striking given the short time span observed. Consider the 
most active bidder taking part in over 800 auctions! This user seems to follow a gambling strategy 
bidding only minimal amounts as he/she wins only a few of these auctions. The most successful buyer 
who won 201 auctions on the other hand took part in only 208 auctions. This hints at a diversity 
of strategies employed by users of the online auction site. Curiously, the article most desired and
attracting 39 different bidders was a ride in a red Coca-Cola truck.

Table 1: Summary of the data set of online auctions obtained between Dec. 6th and 20th, 2004.
Numbers in millions.

auctions observed                   1.59
users acting as buyer               0.95
users acting as seller              0.37
users acting as bidder              1.91
users acting as seller and bidder   0.14
users acting as seller and buyer    0.08

Table 2: Activity distributions of observed users in auctions. Shown are the average values $\langle x \rangle$ and the
exponents $\kappa$ of the tail of the distribution ($x > 10$) if they follow a power law $p(x) \propto x^{-\kappa}$, or the
parameter $\alpha$ if the distribution has an exponential form $q(x) \propto \alpha^{x}$. All parameter estimates are
maximum likelihood estimates.

                                     <x>    kappa   alpha
articles sold per seller:            4.3    2.37
auctions taken part in per bidder:   2.9    2.78
articles bought per buyer:           1.7    3.38
bidders per auction:                 3.4            0.71

From the original data a number of market networks can be constructed, such as the network of 
users connected by actual transactions, or the network of sellers that are connected if they have sold 
to (or received bids from) the same user. Then, the links in the network would represent competition 
or a possibility for cooperation, depending on the portfolio of articles offered by these sellers. 

Here, we focus only on the bidder network based on single articles. Two bidders are linked if they 
have competed in an auction. Since all users that bid in a single auction are connected, this network 
results from overlaying fully connected cliques of bidders that result from each auction. Such graphs 
are also known as affiliation networks [17, 18, 19].
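
The construction described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the input format (one collection of bidder ids per auction) and the function name `bidder_network` are assumptions. Each auction contributes a fully connected clique, and the link weight counts how often two bidders have met.

```python
from collections import Counter
from itertools import combinations

def bidder_network(auctions):
    """Overlay one clique per auction.

    auctions: iterable of collections of bidder ids (hypothetical format).
    Returns a Counter mapping each pair (u, v) with u < v to the number of
    auctions in which both bidders took part (the link weight).
    """
    weights = Counter()
    for bidders in auctions:
        # Multiple bids by one user in the same auction count only once.
        for u, v in combinations(sorted(set(bidders)), 2):
            weights[(u, v)] += 1
    return weights

auctions = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
network = bidder_network(auctions)
repeated = sum(1 for w in network.values() if w > 1) / len(network)
```

In this toy example the pair (a, b) has weight 2 and one of the four links connects bidders who met more than once.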

Prior to a cluster analysis in this bidder network, we study its general statistical properties looking 
for indications of cluster structure [20]. We compare the results to a randomized null model (RNM)
obtained from reshuffling the original data, i.e. keeping the attractiveness of each auction and the 
activity of each bidder constant, but randomizing which bidders take part in which auction. 
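
One simple way to realize such a reshuffling is sketched below. It assumes the same hypothetical input format as a list of bidder lists, and it is only one possible scheme: all participations are pooled, shuffled, and dealt back into auctions of the original sizes. Unlike a more careful swap-based procedure, this sketch may occasionally place the same bidder twice in one auction.

```python
import random

def reshuffle(auctions, seed=None):
    """Randomize which bidders take part in which auction.

    Keeps the number of bidders per auction and the number of
    participations per bidder fixed (stub shuffling; duplicates within
    an auction are possible and are not removed in this sketch).
    """
    rng = random.Random(seed)
    stubs = [b for bidders in auctions for b in bidders]
    rng.shuffle(stubs)
    shuffled, start = [], 0
    for bidders in auctions:
        size = len(bidders)
        shuffled.append(stubs[start:start + size])
        start += size
    return shuffled
```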

Furthermore, we compare the bidder network with theoretical predictions we can derive from the
distribution of the bidding activity $p(a)$ and the distribution of the attractiveness of auctions $q(b)$.
The degree distribution of the bidder network can be calculated from $p(a)$ and $q(b)$ using a generating
function formalism [21, 19], assuming that bidders never meet twice in different auctions. With a
power-law form $p(a) \propto a^{-\kappa}$ and an exponential form $q(b) \propto \alpha^{b}$ as before, the degree distribution
in the bidder network amounts to:

$$p(k) = \alpha^{k} \sum_{a} p(a)\,(1-\alpha)^{2a} \binom{2a-1+k}{k}. \qquad (1)$$

Figure 2 shows a comparison of the empirical data from the bidder network and the theoretical curve
(1). The shapes of the distributions agree quite well, given that we use an estimate based on only two
parameters $\kappa = 2.78$ and $\alpha = 0.71$.
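
Equation (1) can be evaluated numerically as sketched below. Two choices here are assumptions made only for illustration: the power law is applied to all $a \geq 1$ (the fit in Table 2 refers to the tail), and the sum is truncated at a hypothetical cutoff `a_max`. The binomial coefficient is evaluated in log space to avoid overflow for large $k$.

```python
from math import exp, lgamma, log

def degree_distribution(k_max, kappa=2.78, alpha=0.71, a_max=200):
    """Evaluate p(k) from equation (1) for k = 0 .. k_max.

    p(a) ~ a^-kappa is normalized on a = 1 .. a_max (truncation assumed).
    """
    weights = [a ** -kappa for a in range(1, a_max + 1)]
    norm = sum(weights)
    dist = []
    for k in range(k_max + 1):
        total = 0.0
        for a, w in enumerate(weights, start=1):
            # log of binom(2a - 1 + k, k) * alpha^k * (1 - alpha)^(2a)
            log_term = (lgamma(2 * a + k) - lgamma(k + 1) - lgamma(2 * a)
                        + k * log(alpha) + 2 * a * log(1 - alpha))
            total += (w / norm) * exp(log_term)
        dist.append(total)
    return dist
```

For each fixed $a$ the summand is a negative binomial distribution in $k$, so the resulting $p(k)$ is normalized whenever $p(a)$ is.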
[Figure 2 panel axes, left to right: co-bidders (k), link weight (w), co-bidders (k)]

Figure 2: Comparison of the bidder network with a random null model (RNM). Left: degree distribution.
The solid line represents a theoretical degree distribution according to equation (1). Middle:
distribution of the link weights in the bidder network, (o) for the RNM and (x) for the empirical data.
Right: Distribution of the clustering coefficient $c(k)$ as a function of the degree $k$ of the nodes, (o) for
the RNM and (x) for the empirical data. The two dashed lines indicate two power laws $\propto k^{-\kappa}$ with
exponents $\kappa = 1$ and $\kappa = 0.8$, respectively.

A theoretical expectation for the average number of neighbors in the bidder network is given
by $\langle k \rangle = 2(\langle b \rangle - 1)\langle a \rangle = 14$, where $\langle b \rangle$ is the average number of bidders per auction and $\langle a \rangle$ is the
average number of auctions taken part in by a bidder. This estimate is in excellent agreement with
the result from the RNM, but larger than in the actual data, indicating that the probability of meeting
in an auction twice is not zero, confirming our expectation. See Table 3 for a summary of the basic
parameters of the empirical data and the RNM.
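
As a worked check of this estimate, inserting the averages from Table 2 gives the quoted value. The second line is a hedged reading of where the factor 2 comes from, assuming a geometric $q(b)$ on $b \geq 1$ and that a bidder enters an auction with probability proportional to its size $b$:

$$\langle k \rangle = 2(\langle b \rangle - 1)\langle a \rangle = 2 \times (3.4 - 1) \times 2.9 \approx 13.9 \approx 14,$$

$$\frac{\langle b^2 \rangle}{\langle b \rangle} - 1 = \frac{1+\alpha}{1-\alpha} - 1 = \frac{2\alpha}{1-\alpha} = 2(\langle b \rangle - 1), \qquad \langle b \rangle = \frac{1}{1-\alpha}.$$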

Comparing the distribution of the link weights, i.e. the number of times two bidders have met
in an auction, we find a much more prominent difference between the data and the RNM. Figure 2
shows that the weights of the links in the bidder network are distributed with a power law tail.
Approximately 6% of all links correspond to pairs of bidders which have met more than once. If there
were no common interest among bidders, practically all links would have weight 1, as is indeed
the case for the RNM.

In addition to the degree distribution, we compare the distribution of the clustering coefficient
as a function of the degree of a node. The clustering coefficient $c(k)$ denotes the average link density
among the neighbors of a node of degree $k$. Due to the construction process of the network as an
affiliation network, we expect that for large numbers of neighbors $k$ the clustering coefficient $c(k)$
scales as $k^{-1}$ in case of random assignment of bidders to auctions [18]. Figure 2 shows that this is
indeed the case for the RNM, but the actual data deviates strongly for bidders with a large number
of neighbors. This effect can arise from two processes: either bidders with whom one competes in two
different auctions also meet independently in a third auction, or there is an increased probability
that one will compete again with a bidder one has already met once in an auction. Both explanations
support our assumption of the presence of clusters of users with common interest.
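
The quantity $c(k)$ can be computed directly from a link list, as in the following minimal sketch. The function name and input format (a list of node pairs) are assumptions, and the pairwise neighbor check is only practical for small networks, not for a network of the size studied here.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(edges):
    """Average link density among the neighbors of nodes, grouped by degree k."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    sums, counts = defaultdict(float), defaultdict(int)
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # link density among neighbors is undefined for k < 2
        links = sum(1 for x, y in combinations(nbrs, 2) if y in neighbors[x])
        sums[k] += links / (k * (k - 1) / 2)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Triangle a-b-c with one extra bidder d attached to a.
c_of_k = clustering_by_degree([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
```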
Table 3: Summary of basic parameters for the bidder network, in which two bidders are linked if they have
competed in an auction. Shown are the actual data, the parameters for a random null model (RNM)
and the reduced version of the network used for cluster analysis.

                   data          RNM           reduced
number of nodes:   1.8 × 10^6    1.8 × 10^6    0.9 × 10^6
number of links:   11.6 × 10^6   12.6 × 10^6   7.4 × 10^6
average degree:    12.9          13.9          16.4
assortativity:     0.02          0.0           0.03



With these comparisons, we have shown that the bidder network is far from randomly constructed 
and we will proceed by studying the cluster structure for which we have found indirect evidence 
already. 

Market Segmentation 

Network Clustering 

The analysis of the user interests in the eBay market is based on the bidder network as constructed in 
the previous section. The links in this network represent articles the connected bidders (nodes) have 
a common interest in. We reduce the network to only those bidders that have taken part in at least 
two auctions and we consider only auctions with a final price below 1,000 Euro, thereby focussing on 
consumer goods. See Table 3 for the basic parameters of this reduced network.

If we now find groups of users (clusters or communities [22, 23, 24]) with a high density of links
among themselves and a low density of links to the rest of the network, the total set of links within such
a group of users can be interpreted as a unifying common interest of this group. We assign the users
to communities so as to maximize a well established quality function known as network modularity $Q$,
defined by Girvan and Newman (GN) [25]. The definition of $Q$ can also be written as [26]:

$$M Q = \sum_{s} \underbrace{\left(m_{ss} - \gamma [m_{ss}]\right)}_{c_{ss}} = -\sum_{s<r} \underbrace{\left(m_{rs} - \gamma [m_{rs}]\right)}_{a_{rs}}. \qquad (2)$$

Here, the first sum runs over all group indices $s$, while the second runs over all pairs of different group
indices $s < r$. $m_{ss}$ is the number of internal links in group $s$ and $[m_{ss}]$ is an expectation value for this
quantity in case of a random assignment of bidders into groups, given by $[m_{ss}] = K_s^2/4M$. By $K_s$
we denote the total number of links emanating from members of group $s$, and $M$ is the total number of
links in the network. Equivalently, $m_{rs}$ is the number of links between members of groups $r$ and $s$, and
$[m_{rs}]$ is the corresponding expectation value, given by $[m_{rs}] = K_s K_r/2M$ [26]. $Q$ is maximal when
the sum of cohesions $c_{ss}$, defined as the difference between the actual and expected number of
within-group links, is maximal. Equivalently, $Q$ is maximal when there are far fewer links between different
groups than expected for a random assignment of nodes into communities, i.e. the sum of adhesions $a_{rs}$
is minimal. Note that any assignment of bidders into groups which maximizes $Q$ will be characterized
by both maximum cohesion of groups and minimal adhesion between groups. If $Q$ is maximal, every
node is classified in the group to which it has the largest adhesion; otherwise it could be moved to a
different group to increase $Q$. In addition to the original definition of $Q$ by GN, we have introduced
a parameter $\gamma$ which allows us to adjust the relative influence of actually present and expected links in
the definition (setting $\gamma = 1$ recovers the original definition of GN). Values of $\gamma$ smaller or greater
than one lead to larger or smaller communities, respectively. Comparing classifications obtained at
different values of $\gamma$ allows the detection of hierarchy and overlap in the cluster structure. See Refs.
[27, 26] for examples and further details of this variation.
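
To make the definition concrete, the following sketch evaluates $Q$ for a given assignment using the cohesion form of the definition, with $[m_{ss}] = K_s^2/4M$. It only scores an assignment and does not perform the maximization, which is described in the cited references. The function name and the input format (a list of unweighted links and a dict from node to group) are assumptions.

```python
from collections import defaultdict

def modularity(edges, group, gamma=1.0):
    """Q = (1/M) * sum_s (m_ss - gamma * K_s^2 / (4 M)) for a given assignment."""
    M = len(edges)
    internal = defaultdict(int)    # m_ss: links inside group s
    degree_sum = defaultdict(int)  # K_s: link ends attached to members of s
    for u, v in edges:
        degree_sum[group[u]] += 1
        degree_sum[group[v]] += 1
        if group[u] == group[v]:
            internal[group[u]] += 1
    cohesion = sum(internal[s] - gamma * degree_sum[s] ** 2 / (4 * M)
                   for s in degree_sum)
    return cohesion / M

# Two triangles joined by a single link.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
merged = {n: "A" for n in range(1, 7)}
```

For this toy network the split into the two triangles gives $Q = 2.5/7 \approx 0.36$ at $\gamma = 1$, while placing all nodes in one group gives $Q = 0$. Lowering $\gamma$ reduces the penalty on expected links, which is why it favors larger groups.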

The technical details of how the bidders can be assigned into groups such that $Q$ is maximized
are given in Refs. [26, 23]. We allowed for a maximum of 500 different groups of bidders in our analysis,
which gives a sufficient level of detail.

The left part of Figure 3 compares the results obtained with $\gamma = 0.5$ and $\gamma = 1$. Shown are the
adjacency matrices $A_{ij}$ of the largest connected component of the bidder network. A black pixel at
positions $(i, j)$ and $(j, i)$ is shown on an 889828 × 889828 square if bidders $i$ and $j$ have competed in an
auction and hence $A_{ij} = 1$; otherwise the pixel is left white, corresponding to $A_{ij} = 0$. The rows and
columns are ordered such that bidders who are classified as being in the same group are next to each
other. The internal order of bidders within groups is random. The order of the groups was chosen to
optimally show the correspondence between the ordering resulting from the $\gamma = 0.5$ and the $\gamma = 1$
ordering. In this representation, link densities correspond to pixel densities and thus to grey levels in
the figure. Information about the exact size and link density contrast of the clusters is given in the
supporting online material. Note the high contrast between internal and external link density.

Figure 3: Left: N × N adjacency matrix of the bidder network in three different orderings. A pixel
in row $i$, column $j$ corresponds to an auction in which bidders $i$ and $j$ have competed. Shown are
N = 889828 bidders (nodes) and M = 7373008 pairwise competitions (links). Grey levels correspond
directly to link density in this network and hence to the probability of competing in an auction. Top:
$\gamma = 0.5$ ordering, bottom: $\gamma = 1$ ordering and middle: consensus ordering of top and bottom. Right:
Odds ratios of bidding in one of the 32 main eBay product categories for classified users. Top: from
$\gamma = 0.5$ classification, bottom: from $\gamma = 1$ classification. Spectra with a dashed background (cluster id
in parenthesis) show customer purchases 6-9 months after original classification. See text for details.

At the top of Figure 3 we show the adjacency matrix ordered according to an optimal assignment
of bidders into groups with $\gamma = 0.5$. Clearly, a small number of major clusters of bidders and a large
number of smaller clusters are identified, strongly connected internally and well separated from one
another. The largest 8 clusters are marked with letters A through H. Of all bidders in the network,
85% are classified in these 8 clusters. At the bottom, we show the same adjacency matrix, but now
rows and columns are ordered according to an optimal assignment of bidders into groups with $\gamma = 1$.
As expected, we find a larger number of smaller, denser clusters which are numbered 1 through 13.
In order to analyse whether the network has a hierarchical or overlapping cluster structure, we define
a consensus ordering of the bidders from the $\gamma = 0.5$ and $\gamma = 1$ orderings by reshuffling the internal
order of the $\gamma = 0.5$ clusters according to the $\gamma = 1$ clustering. Remember that the orderings for the two
values of $\gamma$ were obtained independently. If the network possesses a hierarchical structure in the sense
that the clusters obtained at higher values of $\gamma$ lie completely within those obtained at lower values
of $\gamma$, then the consensus ordering would not differ from the ordering at $\gamma = 1$. If, however, clusters at
lower values of $\gamma$ overlap and this overlap forms its own cluster at higher values of $\gamma$, the network is
not entirely hierarchical. These aspects become immediately clear by looking at the middle part
of Figure 3. For clarity, we have marked the borders of the $\gamma = 0.5$ clustering. Clusters 1 and 2 fall
entirely within cluster A, giving an example of a cluster hierarchy. Cluster 3, however, is split by the
consensus ordering into one part A3 belonging to A, and B3 belonging to B (see arrows in figure). It
is now clear that clusters A and B actually have some overlap which was not visible in the $\gamma = 0.5$
ordering. This overlap is concentrated in cluster 3, parts of which belong more strongly to either A or B.
Clusters 4 and 5 then fall again completely within cluster B. Clusters 6 and C are practically identical.
Cluster D has a number of sub-clusters, the largest of which is 7, and overlaps with cluster E through
cluster 8 in the same way (see arrows again). Group E has two more sub-groups 9 and 10, while clusters 11,
12 and 13 fall entirely within clusters F, G and H, respectively. More details about hierarchical and
overlapping cluster structures including some toy examples can be found in [26].
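
The consensus ordering described above can be sketched as a two-level sort, and clusters that are split by it can be read off directly. Both function names and the input format (dicts from bidder id to cluster label at each value of $\gamma$) are assumptions; the display order of groups in Figure 3 was chosen by hand and is not reproduced here.

```python
from collections import defaultdict

def consensus_order(bidders, coarse, fine):
    """Order bidders by their coarse cluster, then by fine cluster within it."""
    return sorted(bidders, key=lambda b: (coarse[b], fine[b]))

def split_fine_clusters(coarse, fine):
    """Fine clusters whose members belong to more than one coarse cluster."""
    parents = defaultdict(set)
    for bidder, f in fine.items():
        parents[f].add(coarse[bidder])
    return {f: sorted(p) for f, p in parents.items() if len(p) > 1}

coarse = {"u1": "A", "u2": "A", "u3": "A", "u4": "B", "u5": "B"}  # gamma = 0.5
fine = {"u1": 1, "u2": 3, "u3": 1, "u4": 3, "u5": 4}              # gamma = 1
order = consensus_order(sorted(coarse), coarse, fine)
overlap = split_fine_clusters(coarse, fine)
```

In this toy example fine cluster 3 is split between A and B, mirroring the role of cluster 3 in the text, while clusters 1 and 4 are purely hierarchical.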

Cluster Validation, Interpretation and Time Development 

To validate the statistical significance and to rule out the possibility that the observed cluster structure
is merely a product of the clustering algorithm or the particular method of constructing the network
from overlapping cliques of bidders, we compare the results to those obtained for appropriate random
null models. Maximizing $Q$ also for the RNM version of the bidder network, again taking into account
only those bidders which took part in at least two auctions, we find a value of $Q = 0.28$ at $\gamma = 1$, which
is significantly less than the value of $Q = 0.64$ for the empirical data. Furthermore, the RNM shows
clusters of equal size throughout, while the real network clearly possesses major and minor clusters. A random
graph with the same number of nodes and links, i.e. disregarding the scale free degree distribution
and the affiliation network structure of the graph, would yield only $Q = 0.23$ [26].

Until now we have only found groups of bidders that have an increased probability of meeting other
members of their groups in the auctions they take part in. The eBay product categories are now
used in order to find an interpretation for the common interests that lead to the emergence of the
cluster structure of the bidder network. Since cluster sizes vary and the number of articles in the
individual categories is very diverse, we calculate the odds ratios (OR) for bidding in one of the 32
main categories. This odds ratio is defined as

$$\mathrm{OR}_s^C = \frac{P(\text{bidding in } C \mid \text{member of cluster } s)}{P(\text{bidding in } C \mid \text{not member of cluster } s)},$$

i.e. the ratio of the odds of bidding in category $C$, given a bidder is member of group $s$ vs. the odds of
bidding in category $C$ given the bidder is member of any group $r \neq s$. The right hand side of Figure 3
shows a graphical representation of the odds ratios for clusters A through H and most of the clusters
1 through 13. All spectra are normalized. The exact numerical values can be found in the supporting
online material. Clusters from the $\gamma = 1$ assignment are more specific, with fewer entries in the category
spectrum and larger ORs.
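
The ratio defined above can be computed as in the following sketch. The text does not say whether the probabilities are estimated per bid or per bidder; counting per bid is an assumption made here, as are the function name and the input format (a list of (bidder, category) pairs and a dict from bidder to cluster). The sketch does not guard against empty groups or categories that never occur outside the cluster.

```python
def odds_ratio(bids, cluster_of, category, cluster):
    """Ratio of P(bid in category | in cluster) to P(bid in category | not in cluster)."""
    in_hits = in_total = out_hits = out_total = 0
    for bidder, cat in bids:
        if cluster_of[bidder] == cluster:
            in_total += 1
            in_hits += cat == category
        else:
            out_total += 1
            out_hits += cat == category
    return (in_hits / in_total) / (out_hits / out_total)

bids = [("u1", "toys"), ("u1", "toys"), ("u1", "books"),
        ("u2", "toys"), ("u2", "books"), ("u2", "books"), ("u2", "books")]
cluster_of = {"u1": 1, "u2": 2}
toys_ratio = odds_ratio(bids, cluster_of, "toys", 1)
```

Here members of cluster 1 place 2 of 3 bids in the toy category against 1 of 4 for everyone else, giving a ratio of about 2.7.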

Cluster A unites bidders interested in articles listed in the baby, beauty, fashion, books, movies 
and music category. Cluster 1 then represents a more specifically content oriented user group mainly 
interested in books, movies and music. As we have seen, cluster 1 is an almost complete sub-cluster 
of A. Cluster 2 is also a complete sub-cluster of A and encompasses bidders mainly interested in 
cosmetics and fashion. 

Cluster B contains two sub-clusters 4 and 5, both annotated in the toy model category. Closer 
inspection, however, reveals that cluster 4 is mainly characterized by its interest in model railways 
while the bidders in cluster 5 have a passion for model cars, radio controlled models, slot cars and the 
like. Note the advantage of clustering based on single articles. The clusters we find with one simple 
unbiased method combine top level categories as in the case of cluster 1 or can only be described by
resorting to sub-categories as in the case of clusters 4 and 5. From the left part of Figure 3 we had
observed that cluster 3 is responsible for a large part of the overlap between clusters A and B. We 
see that users in this group 3 have their main interests in the baby and toy category. The overlap of 
cluster A and B is hence mediated via the toy category. Members of cluster A and B mainly meet in 
toy auctions. The interpretation of the other clusters is then equally straightforward. 

Bidders in clusters C and the practically identical cluster 6 take interest in audio equipment and 
instruments. Cluster D represents bidders with an inclination to collecting, their bids being placed 
in the antiques, jewelry, stamps and coins category (cluster 7). The bidders in cluster E are mainly 
shopping for technological gadgets, computers, consumer electronics, software, mobile phones, PDAs 
etc. (clusters 9 and 10). Their overlapping interest with bidders from cluster D is in items from the 
photo category (cluster 8). In groups F and 11, we find predominantly practically oriented users who 
place their bids mainly in the categories of automotive spare parts, business and industry (where a lot 
of tools and machinery are auctioned) and do-it-yourself. Finally, in groups G and 12 we find event 
oriented customers with strong bidding activity in the tickets and travel category and in group H and 
13, we find people bidding on sports equipment. 

Let us now focus on the time development of the user interests. The data for this analysis was 
collected during only a relatively short time span (25 days) and we base our results on an extremely 
sparse data set. Remember that every bidder in the network took part in only 3 auctions on average. Is 
it really possible to predict meaningful patterns of consumer interest from such sparse data? One could 
further argue that the few most active bidders account for a large portion of the bids, thus holding the 
network together and "defining" the clusters of interest, because they also contribute a large number
of links. In order to address this question, we revisited the data set at the beginning of September 2005,
more than nine months after our original study. From the 6 largest clusters of the $\gamma = 1$ ordering, we
uniformly and randomly sampled 10,000 users each. Note that this removes possible bias towards very
active users, as they are now represented in the data according to their proportion in the population.
Then we looked at the trading history of these users as far back as eBay permits, i.e. 90 days. For these
60,000 users, we determined the product categories of the articles they had bought between June
and September. Again, we calculated the odds ratios, this time of buying, i.e. winning an auction,
from a particular category and with the new sample of users as basic population. The results are
shown on the right hand side of Figure 3 with a dashed background and the cluster id from which
the users were sampled in parenthesis. The stability of the interest profiles is quite remarkable. The 
main interests have remained unchanged as compared to the initial study though in some cases the 
spectrum has become more diverse. For instance the content oriented bidders of cluster 1 now also 
show increased buying activity in the PC-games and tickets category. At the same time the main 
interest has shifted from movies to music. The largest number of product categories with increased 
odds of bidding in this category is found for cluster 9, the members of which are the most technology 
affine users anyway and which would be expected to satisfy a very broad range of consumer needs 
from online vendors. The members of cluster 7 (the collectors) and cluster 4 (the toy model builders) 
are much more conservative and almost do not change their profile at all. Without second hand data 
about the age structure of the bidders classified, we can only speculate that these clusters are formed 
by older customers who tend to stick to particular categories. 

Conclusion 

We have presented a detailed study of the user behavior on the online auction site www.ebay.de
during the pre-Christmas season of 2004. Power-law distributed activity in terms of the number of 
articles sold, bought, and bid on was found. The attractiveness of articles, measured in terms of 
the number of bidders participating in an auction, shows an exponential distribution. Focussing on 
the bidding behavior, we constructed a network of bidders from their competition for single articles. 
Nodes in the network correspond to bidders and links to the fact that these bidders have expressed a 
common interest in at least one article. Studying the general statistical properties and comparing to 
appropriate random models, we find clear indications for a non trivial cluster structure. This cluster 
structure, its hierarchy and overlap was studied using a community detection algorithm. Our analysis 
did not need the definition of any kind of similarity measure between articles or product categories. 
Rather, we solely used the taxonomic information about articles provided by eBay to interpret our 
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results. We can classify 85% of the users into only a small number of well separated, large clusters, all 
of which have a distinct profile of only a few main interests as revealed by annotating the articles in the 
taxonomy of product categories. Some of the clusters show sub-clusters or overlap with other clusters. 
The interest profiles we identified are remarkably stable. Sampling randomly from the clusters and 
checking, what these users bought during a three month period in the summer 2005, we found that 
the profiles of articles bought were almost identical to those from the classification 6 months earlier. 
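
The network construction summarized above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes the raw data are available as (bidder, article) pairs, and the function name `build_bidder_network` and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_bidder_network(bids):
    """Return the set of undirected links between bidders.

    `bids` is an iterable of (bidder_id, article_id) pairs. Two bidders are
    linked if they have bid on at least one common article.
    """
    # Group bidders by the article they competed for.
    bidders_per_article = defaultdict(set)
    for bidder, article in bids:
        bidders_per_article[article].add(bidder)

    links = set()
    for bidders in bidders_per_article.values():
        # Every pair of competitors for the same article shares one link;
        # sorting gives each undirected link a unique representation.
        for u, v in combinations(sorted(bidders), 2):
            links.add((u, v))
    return links


# Toy data (assumed): three auctions, four bidders.
bids = [
    ("anna", "camera"), ("ben", "camera"), ("cleo", "camera"),
    ("ben", "stamp"), ("dave", "stamp"),
    ("anna", "lens"), ("ben", "lens"),
]
links = build_bidder_network(bids)
```

Because links are stored as a set, two bidders who meet in several auctions are still connected by a single link, in line with the "at least one article" criterion.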

This is striking because virtually everything is offered on eBay, and one would expect users to 
satisfy a much broader range of shopping interests. However, it appears that the major clusters 
mainly correspond to people's favorite spare-time activities. We believe the apparent stability of 
users' buying and bidding behavior reflects the permanence of their interests, which is also stabilized 
by their social environment and activities. The clear signature in the market data may stem from 
the fact that users tend to buy online only articles with which they have some experience and 
expertise. Users seem hesitant to bid on articles from categories in which they have not previously bid. 
This may be because inexperienced users cannot judge what a fair price for an article 
in an auction is, and they have difficulty assessing to what extent the article offered really suits their 
needs. At the same time, users' interests are reinforced by online recommender systems [28, 29], which 
suggest articles similar to those already bought by the user. This temporal stability corroborates the 
hypothesis that the presence of latent interest profiles in society per se leads to the emergence of 
user groups with common interests. Transparent markets such as online auction sites, in which users 
act independently and anonymously, are perfect starting points for research into this collective behavior. 
Supporting Material 



User Activity 




[Figure 4 panels, top left to bottom right: # of articles sold per seller (s); # of articles bid on per bidder (a); 
# of articles bought per buyer (w); bidders per auction (b)] 

Figure 4: User activity during the pre-Christmas season 2004. From top left to bottom right: probability 
mass function of the number of articles sold $p(s)$, different auctions participated in $p(a)$ and 
number of articles bought $p(w)$, and number of bidders participating in an auction $q(b)$. For the first 
three curves, the solid lines correspond to maximum likelihood fits of the tail of the distribution for 
$s, a, w > 10$ with power laws $p(x) \propto x^{\kappa}$ and exponents $\kappa_s = -2.37$, $\kappa_a = -2.78$ and $\kappa_w = -3.38$, 
respectively. For the distribution of the desirability of an article $q(b)$, the dashed line represents a 
maximum likelihood exponential fit of the form $q(b) \propto a^{b}$ with $a = 0.71$. 
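
The two kinds of maximum likelihood fits named in the caption can be sketched as follows. This is a hedged illustration rather than the fitting procedure actually used: the power-law estimator is the common continuous approximation for integer data (shifting the lower cutoff by one half), the exponential fit assumes a geometric distribution over auctions with at least one bidder, and both function names are hypothetical.

```python
import math


def powerlaw_tail_exponent(samples, x_min):
    """Approximate MLE of kappa in p(x) ~ x**kappa for integers x >= x_min.

    Uses the continuous approximation for discrete data, in which the cutoff
    is shifted to x_min - 0.5. The returned kappa is negative.
    """
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return -(1.0 + len(tail) / log_sum)


def geometric_base(counts):
    """MLE of a in q(b) = (1 - a) * a**(b - 1) for counts b = 1, 2, ..."""
    if not counts:
        raise ValueError("no samples")
    mean = sum(counts) / len(counts)
    return 1.0 - 1.0 / mean


# Tail condition s, a, w > 10 corresponds to x_min = 11 for integer counts.
kappa = powerlaw_tail_exponent([3, 11, 12, 15, 40, 200], x_min=11)
a = geometric_base([1, 2, 3, 4])
```

Samples below the cutoff are ignored by construction, so only the tail enters the estimate of the exponent.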

The distribution of the number of articles sold per seller falls off most slowly, followed by the number of 
articles bid on and the number of articles bought. Here, we see the professionalization of the seller 
side of the market. There are "power-sellers" making a living from selling via eBay, but there are 
hardly any "power-buyers" professionally buying on eBay. This shows that eBay is more of a selling 
platform than an actual trading site, where selling and buying activities would be more balanced. If 
we assume that the tail of the distribution of the number of articles sold per seller is representative of 
the "firm size" of these users and compare it to the long-term statistics of firm sizes in the US given 
by Axtell [30], we can confirm the power-law tail of the distribution, but not the exponent of $\kappa = 2$. 
Instead, we find $\kappa = 2.37$, and thus the observed distribution falls off faster. We can only speculate on 
the reasons for this, and further study is needed to compare the new and the old economy. In an earlier 
study, Yang et al. reported an exponent of $\kappa = 3.5$ for the distribution of the number of auctions 
a bidder takes part in, from a data set obtained in 2001 [2], whereas we found $\kappa = 2.78$ in our data. If this 
discrepancy is the result of a trend and not due to differences in the observed countries and the sizes 
of the data sets, and if this trend also holds for the distribution of the sellers' activity, then one may be 
able to observe a convergence towards the exponent of $\kappa = 2$ known from the old economy. 
Cluster Parameter 

| Cluster | N      | ⟨k_in⟩ | ⟨k_out⟩ | p_in    | p_out   |
|---------|--------|--------|---------|---------|---------|
| A       | 200630 | 10.2   | 3.4     | 5.1E-05 | 5.0E-06 |
| 1       | 84699  | 10.3   | 4.0     | 1.2E-04 | 5.0E-06 |
| 2       | 29323  | 9.0    | 5.2     | 3.1E-04 | 6.0E-06 |
| 3       | 76182  | 10.1   | 4.1     | 1.3E-04 | 5.0E-06 |
| B       | 102188 | 18.6   | 3.9     | 1.8E-04 | 5.0E-06 |
| 4       | 44830  | 24.6   | 4.2     | 5.5E-04 | 5.0E-06 |
| 5       | 26325  | 14.2   | 5.2     | 5.4E-04 | 6.0E-06 |
| C       | 19915  | 14.1   | 4.3     | 7.1E-04 | 5.0E-06 |
| 6       | 20020  | 14.5   | 4.3     | 7.3E-04 | 5.0E-06 |
| D       | 124702 | 16.5   | 3.8     | 1.3E-04 | 5.0E-06 |
| 7       | 74913  | 17.2   | 4.1     | 2.3E-04 | 5.0E-06 |
| 8       | 41359  | 16.8   | 5.9     | 4.1E-04 | 7.0E-06 |
| E       | 183313 | 15.4   | 4.2     | 8.4E-05 | 6.0E-06 |
| 9       | 73722  | 13.4   | 6.5     | 1.8E-04 | 8.0E-06 |
| 10      | 47937  | 17.5   | 5.9     | 3.7E-04 | 7.0E-06 |
| F       | 74657  | 10.5   | 4.9     | 1.4E-04 | 6.0E-06 |
| 11      | 62115  | 11.1   | 5.0     | 1.8E-04 | 6.0E-06 |
| G       | 31337  | 11.0   | 6.0     | 3.5E-04 | 7.0E-06 |
| 12      | 18835  | 11.8   | 6.1     | 6.3E-04 | 7.0E-06 |
| H       | 19620  | 10.0   | 4.4     | 5.1E-04 | 5.0E-06 |
| 13      | 18286  | 9.9    | 4.4     | 5.4E-04 | 5.0E-06 |

Table 4: Summary of basic parameters for the major communities found in the bidder network (annotated 
as in Figure 3). $N$ denotes the number of bidders in the cluster, $\langle k_{in} \rangle$ and $\langle k_{out} \rangle$ the average 
numbers of neighbors within the cluster and in the rest of the network, respectively. By $p_{in}$ and $p_{out}$ 
we denote the internal and external link density, respectively. The average link density in the network 
is $\langle p \rangle = 1.9 \times 10^{-5}$. 
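
The quantities in Table 4 can be computed from a link list and a cluster membership as sketched below. The definitions used here are an assumption (the standard ones: realized links divided by possible links), and the function name `cluster_parameters` and the toy network are hypothetical.

```python
def cluster_parameters(links, members, n_total):
    """Compute N, <k_in>, <k_out>, p_in and p_out for one cluster.

    `links` is an iterable of undirected (u, v) pairs, `members` the nodes of
    the cluster and `n_total` the number of nodes in the whole network.
    """
    members = set(members)
    n = len(members)
    if n < 2 or n >= n_total:
        raise ValueError("cluster needs at least 2 nodes and a non-empty rest")

    links = list(links)
    # Links with both ends inside the cluster.
    m_in = sum(1 for u, v in links if u in members and v in members)
    # Links with exactly one end inside the cluster.
    m_out = sum(1 for u, v in links if (u in members) != (v in members))

    return {
        "N": n,
        "k_in": 2 * m_in / n,                     # each internal link has two internal ends
        "k_out": m_out / n,
        "p_in": m_in / (n * (n - 1) / 2),         # fraction of possible internal links
        "p_out": m_out / (n * (n_total - n)),     # fraction of possible external links
    }


# Toy network (assumed): a triangle a-b-c attached to a chain c-d-e.
toy_links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
params = cluster_parameters(toy_links, {"a", "b", "c"}, n_total=5)
```

Under these definitions $\langle k_{in} \rangle = p_{in} (N - 1)$, which agrees with the first row of Table 4: 10.2 / (200630 - 1) is approximately 5.1E-05.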
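
| Category      | A    | 1          | 2    | 3           | B    | 4           | 5    | C     | D    | 7            | 8    | E    | 9           | 10   | F    | 11          | G    |   |
|---------------|------|------------|------|-------------|------|-------------|------|-------|------|--------------|------|------|-------------|------|------|-------------|------|---|
| Music         | 5.6  | 8.6 (12.0) |      |             |      |             |      | 1.5   |      |              |      |      |             |      |      |             |      |   |
| Books         | 2.7  | 3.4 (2.5)  |      | (1.1)       |      |             |      |       |      |              |      |      |             |      |      |             |      |   |
| Movies        | 12.1 | 22.1 (9.2) |      |             |      |             |      |       |      |              |      |      |             |      |      |             |      |   |
| Beauty        | 5.6  | (1.5)      | 29.3 | (1.3)       |      |             |      |       |      |              |      |      | (1.7)       |      |      |             |      |   |
| Baby          | 13.7 |            |      | 41.0 (20.0) |      |             |      |       |      |              |      |      |             |      |      |             |      |   |
| Toys          | 1.3  |            |      | 14.2 (4.9)  | 5.0  | (1.6)       | 1.1  |       |      |              |      |      |             |      |      |             |      |   |
| Toy Models    |      |            |      |             | 70.2 | 21.9 (28.0) | 10.2 |       |      |              |      |      |             |      |      |             |      |   |
| Audio         |      | (1.3)      |      |             |      |             |      | 16.2  |      |              | 1.2  | 3.4  | 2.1 (3.9)   |      | 1.0  | 1.4 (2.3)   |      |   |
| Instruments   |      | (1.5)      |      |             |      | (1.9)       |      | 144.3 |      |              |      |      | (2.2)       |      |      |             |      |   |
| Antiques      |      |            |      |             | 1.2  | 1.8 (1.1)   |      |       | 6.6  | 6.3 (3.7)    |      |      |             |      |      |             |      |   |
| Stamps        |      |            |      |             |      |             |      |       | 88.6 | 118.0 (18.0) |      |      |             |      |      |             |      |   |
| Coins         |      |            |      |             |      |             |      |       | 66.7 | 88.3 (18.0)  |      |      |             |      |      |             |      |   |
| Collectibles  |      |            |      |             |      |             |      |       | 1.6  | 1.7 (2.6)    |      |      |             |      |      |             |      |   |
| Jewelry       | 1.1  |            | 1.3  |             |      |             |      |       | 4.5  | 5.3 (3.1)    |      |      |             |      |      |             |      |   |
| Photo         |      |            |      |             |      | (1.2)       |      |       | 4.4  |              | 47.9 | 2.8  | 1.5 (3.2)   |      |      | (1.4)       |      |   |
| Computer      |      |            |      |             |      |             |      |       |      |              |      | 19.6 | 14.7 (12.0) | 3.1  |      | (1.4)       |      |   |
| Cons. Electr. |      |            |      |             |      |             |      |       |      |              |      | 6.7  | 16.1 (4.3)  |      | 2.0  | 2.5 (5.9)   |      |   |
| Mobile        |      |            |      |             |      |             |      |       |      |              |      | 25.2 | 1.3 (4.7)   | 33.0 |      | (2.1)       |      |   |
| Games         |      | (2.2)      |      | (1.1)       |      |             |      |       |      |              |      | 3.3  | 2.3 (3.4)   | 1.4  |      |             |      |   |
| Software      |      | (1.2)      |      |             |      |             |      |       |      |              | 1.5  | 5.1  | 2.7 (4.0)   |      |      | (1.5)       |      |   |
| Business      |      |            |      |             |      |             |      |       |      |              |      |      | (1.1)       |      | 21.5 | 16.9 (15.0) |      |   |
| DIY           |      |            |      |             |      |             |      |       |      |              |      |      | (2.2)       |      | 17.9 | 19.1 (7.0)  |      |   |
| Motors        |      |            |      |             |      |             | 1.4  |       |      |              |      | 1.5  | (1.6)       |      | 8.6  | 8.2 (8.9)   |      |   |
| Travel        |      | (1.2)      |      |             |      |             |      |       |      | (1.7)        |      |      | (1.4)       |      |      |             | 58.2 |   |
| Tickets       |      | (2.7)      |      | (1.7)       |      |             |      |       |      |              |      |      | (1.7)       |      |      |             | 42.9 |   |
| Sports        |      | (1.3)      |      | (1.3)       |      |             |      |       |      |              |      |      | (2.0)       |      |      | (1.9)       | 1.4  |   |
| Office        |      | (1.1)      |      |             |      |             |      |       |      |              |      | 2.3  | 1.1 (2.4)   |      | 1.9  | 1.6 (2.0)   |      |   |
| Deli Food     |      | (1.2)      |      |             |      |             |      |       |      |              |      |      | (1.5)       |      |      | (2.0)       | 9.0  |   |
| Household     |      |            |      |             |      |             |      |       |      |              |      | 3.7  | 3.3 (4.1)   |      | 1.1  | (1.8)       | 1.8  |   |
| Fashion       | 1.9  |            | 2.7  | 1.7 (4.1)   |      |             |      |       |      |              |      |      | (1.2)       |      |      |             | 1.1  | 1 |
| Furniture     |      | (1.0)      | 1.1  | (1.6)       |      |             |      |       | 1.3  |              |      |      | (1.6)       |      |      |             | 4.2  |   |
| Animals       |      | (1.2)      | 1.0  | (1.3)       |      |             |      |       |      |              |      | 2.0  | (1.9)       |      |      | (1.4)       |      |   |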

Table 5: Odds ratios (OR) for bidding in one of the 32 main categories during the pre-Christmas season 2004. Only values above 1 are shown, signifying an 
increased interest in articles from this category. Values larger than or equal to 5 are set in bold font. Note how the clusters from the $\gamma = 1$ clustering are more 
specific than those from the $\gamma = 0.5$ clustering, i.e., there are fewer categories with an OR larger than 1, and those that are deviate more strongly from 1. Also 
note how the overlap between clusters A and B is mediated via the toy category and the overlap between clusters D and E via the photo category. For 
the six largest clusters of the $\gamma = 1$ clustering, the OR of buying from the corresponding category during the summer of 2005 is shown in parentheses, as explained in 
the text. 
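
For orientation, an odds ratio of the kind tabulated above can be computed as in the following sketch. It uses the standard definition and assumes that the comparison group consists of all users outside the cluster; the function name `odds_ratio` and the example counts are hypothetical and not taken from the data.

```python
def odds_ratio(cluster_hits, cluster_size, other_hits, other_size):
    """Odds ratio of bidding in a category for cluster members vs. other users.

    `cluster_hits` of `cluster_size` cluster members bid in the category, and
    `other_hits` of `other_size` users outside the cluster did so.
    """
    cluster_misses = cluster_size - cluster_hits
    other_misses = other_size - other_hits
    if cluster_misses <= 0 or other_misses <= 0 or other_hits <= 0:
        raise ValueError("odds ratio undefined for these counts")
    odds_cluster = cluster_hits / cluster_misses
    odds_other = other_hits / other_misses
    return odds_cluster / odds_other


# Assumed counts: 300 of 1000 cluster members and 500 of 9000 other users
# bid in the category.
example_or = odds_ratio(300, 1000, 500, 9000)
```

A value above 1 means that cluster members are more likely than other users to bid in the category, which is the criterion for an entry to appear in Table 5.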



